A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0)

Authors

  • Lukas F K Kuderna
  • Chad Tomlinson
  • LaDeana W Hillier
  • Annabel Tran
  • Ian T Fiddes
  • Joel Armstrong
  • Hafid Laayouni
  • David Gordon
  • John Huddleston
  • Raquel Garcia Perez
  • Inna Povolotskaya
  • Aitor Serres Armero
  • Jèssica Gómez Garrido
  • Daniel Ho
  • Paolo Ribeca
  • Tyler Alioto
  • Richard E Green
  • Benedict Paten
  • Arcadi Navarro
  • Jaume Bertranpetit
  • Javier Herrero
  • Evan E Eichler
  • Andrew J Sharp
  • Lars Feuk
  • Wesley C Warren
  • Tomas Marques-Bonet
Abstract

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4) is highly fragmented: its sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in the number of bases represented and in their organization into large scaffolds. We show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increases in contiguity of >750% and 300% for contigs and scaffolds, respectively, and closure of 77% of the gaps in the Pan_tro_2.1.4 assembly, with the closed gaps spanning >850 Kbp of novel coding sequence based on RNA-Seq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions relative to human in Pan_tro_2.1.4, and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset.
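The abstract reports assembly contiguity as a contig N50 and as percentage increases. As an illustrative sketch only (not the authors' pipeline), the Python below computes N50 from a list of sequence lengths and the percentage increase between an old and a new value. The function names `n50` and `percent_increase` are hypothetical, and the example lengths are toy numbers, not data from Pan_tro_2.1.4 or Pan_tro_3.0.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    sorted_lengths = sorted(lengths, reverse=True)  # longest first
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first sequence at which half the total is reached.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


def percent_increase(old, new):
    """Percentage change from old to new (e.g. 750.0 means 8.5x old)."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Toy contig lengths (hypothetical, in bp).
    print(n50([100, 80, 50, 30, 20, 10]))  # 80
    print(percent_increase(100, 850))      # 750.0
```

Read as a percentage increase in this sense, a gain of >750% means the new value is more than 8.5 times the old one.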

Volume 6

Publication date: 2017